Longitudinally monitored immune biomarkers predict the timing of COVID-19 outcomes

The clinical outcome of SARS-CoV-2 infection varies widely between individuals. Machine learning models can support decision making in healthcare by assessing fatality risk in patients who do not yet show severe signs of COVID-19. Most predictive models rely on static demographic features and clinical values obtained upon hospitalization. However, time-dependent biomarkers associated with COVID-19 severity, such as antibody titers, can substantially contribute to the development of more accurate outcome models. Here we show that models trained on immune biomarkers, longitudinally monitored throughout hospitalization, predicted mortality and were more accurate than models based on demographic and clinical data available upon hospital admission. Our best-performing predictive models were based on the temporal analysis of anti-SARS-CoV-2 Spike IgG titers and white blood cell (WBC), neutrophil, and lymphocyte counts. These biomarkers, together with C-reactive protein and blood urea nitrogen levels, were found to correlate with severity of disease and mortality in a time-dependent manner. Shapley additive explanations of our model revealed the increasing predictive value of days post-symptom onset (PSO) as hospitalization progresses and showed how immune biomarkers contribute to predicting mortality. In sum, we demonstrate that the kinetics of immune biomarkers can inform clinical models and serve as a powerful monitoring tool for predicting fatality risk in hospitalized COVID-19 patients, underscoring the importance of contextualizing clinical parameters according to their time post-symptom onset.


Dataset
We formulated the task of predicting the future mortality of a patient as a sequence prediction problem: given all information up to time $t$, we seek to predict whether a desired outcome will occur within a time interval $(t, t + k]$, where $k$ determines the window length. We model the trajectory of patient $i$ as a sequence of tuples $(x_{i,t}, y^k_{i,t})_t$, where $x_{i,t}$ and $y^k_{i,t}$ correspond to the example and label at time $t$. $x_{i,t}$ is a vector consisting of the following (normalized) data points: $-\log \mathrm{EC}_{50}$, neutrophils, lymphocytes, white blood cells, and day of sampling.
$y^k_{i,t}$ is determined as follows. Let $d_{i,t}$ indicate whether patient $i$ has died on or before time $t$: $d_{i,t} = 1$ if so, and $d_{i,t} = 0$ otherwise. We set $y^k_{i,t} = 1$ if any of $[d_{i,t}, d_{i,t+1}, \ldots, d_{i,t+k}]$ is nonzero (otherwise $y^k_{i,t} = 0$). In other words, $y^k_{i,t}$ indicates whether the patient will die within the next $k$ days.
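To make the labeling rule concrete, the following is a minimal sketch (not code from the paper) of how $y^k_{i,t}$ could be computed for a single patient; the function name and the convention of encoding survivors with a death day of None are illustrative assumptions.

```python
import numpy as np

def mortality_labels(death_day, num_days, k):
    """Illustrative construction of y^k_{i,t} for a single patient.

    death_day: day index on which the patient died, or None if they survived.
    num_days:  number of observed time steps t = 0, ..., num_days - 1.
    k:         length of the prediction window.
    """
    # d_{i,t} = 1 if the patient has died on or before day t.
    d = np.zeros(num_days + k, dtype=int)
    if death_day is not None and death_day < num_days + k:
        d[death_day:] = 1
    # y^k_{i,t} = 1 if any of d_{i,t}, ..., d_{i,t+k} is nonzero.
    return np.array([d[t:t + k + 1].any() for t in range(num_days)], dtype=int)

# Example: a patient observed for 5 days who dies on day 6, with k = 2:
# only t = 4 has the death within its window, so the labels are [0 0 0 0 1].
print(mortality_labels(death_day=6, num_days=5, k=2))
```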

Model
The goal of our model is to predict $y^k_{i,t}$ given the history of past events. Given the limited amount of training data available, we chose to simplify this problem and only consider data from the current time step; hence, we seek to predict $y^k_{i,t}$ from $x_{i,t}$ alone. To do this, we evaluated three types of classifiers: random forest, logistic regression, and neural network.
We performed 100 training runs for each model, each time randomly sampling a 0.75:0.25 train/test split and randomly oversampling the minority class to balance the distribution of labels in the training set. We evaluated the performance of each model on its test set and used that information to generate receiver-operating characteristic (ROC) and precision-recall curves. These curves were then averaged to produce mean estimates and to provide an estimate of the variance across our models. The plotted ROC curves represent the mean plus or minus 2 standard errors of the mean.
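A minimal sketch of one such run, assuming scikit-learn; the oversampling-by-duplication strategy, the assumption that deaths (label 1) are the minority class, and all variable names are illustrative rather than taken from the paper's code.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

def one_training_run(model, X, y, seed):
    """One of the 100 runs: split, oversample, fit, and score on the test set."""
    # Random 0.75:0.25 train/test split.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=seed)
    # Randomly oversample the minority class (assumed here to be label 1)
    # in the training set only, by duplicating rows.
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y_tr == 1)
    extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size,
                       replace=True)
    X_tr = np.vstack([X_tr, X_tr[extra]])
    y_tr = np.concatenate([y_tr, y_tr[extra]])
    fitted = clone(model).fit(X_tr, y_tr)
    # ROC curve on the held-out test set.
    fpr, tpr, _ = roc_curve(y_te, fitted.predict_proba(X_te)[:, 1])
    return fpr, tpr

# Averaging the 100 resulting curves requires interpolating each TPR onto a
# common FPR grid before taking the mean +/- 2 standard errors of the mean.
```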
Lastly, for the neural network models, we estimated Shapley values for each sample using the Python package shap. To obtain a consistent estimate of the Shapley values across training runs, we averaged the Shapley values for each sample over all runs in which that sample appeared in the training set.
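The paper does not state which shap explainer was used; the sketch below assumes the model-agnostic KernelExplainer applied to a trained classifier, which is one common choice for scikit-learn models.

```python
import shap

# `model` is a trained classifier (e.g., the neural network below) and
# `X_bg` a small background sample drawn from the training data.
explainer = shap.KernelExplainer(lambda X: model.predict_proba(X)[:, 1], X_bg)
shap_values = explainer.shap_values(X_train)

# Across the 100 runs, Shapley values are accumulated per sample and divided
# by the number of runs in which that sample fell into the training set.
```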

Logistic Regression
We trained a logistic regression model using the L-BFGS method with a maximum iteration count of 10,000. During training, we used $\ell_2$ regularization with a regularization coefficient of 0.1.
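In scikit-learn terms (an assumption; the implementation library is not named for the classifiers), this corresponds to the following, noting that sklearn parameterizes $\ell_2$ regularization through the inverse-strength parameter C, so a coefficient of 0.1 maps to C = 1/0.1 = 10:

```python
from sklearn.linear_model import LogisticRegression

# L-BFGS solver, up to 10,000 iterations, l2 penalty with coefficient 0.1
# (expressed as the inverse-strength parameter C = 10).
logreg = LogisticRegression(solver="lbfgs", max_iter=10000,
                            penalty="l2", C=10.0)
```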

Random Forest
We trained a random forest consisting of 1000 trees, using the Gini impurity measure to determine tree splits.
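A one-line scikit-learn equivalent (again, an assumed library):

```python
from sklearn.ensemble import RandomForestClassifier

# 1000 trees with Gini impurity as the split criterion.
forest = RandomForestClassifier(n_estimators=1000, criterion="gini")
```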

Neural Network
We trained a neural network with two hidden layers consisting of 100 and 50 neurons, respectively. We used the ReLU (rectified linear unit) activation function and trained the network using the Adam solver with a maximum of 1000 iterations. We also included an $\ell_2$ regularization parameter of 0.5.
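Assuming scikit-learn's MLPClassifier, whose alpha argument is the $\ell_2$ penalty strength, this configuration reads:

```python
from sklearn.neural_network import MLPClassifier

# Hidden layers of 100 and 50 units, ReLU activations, Adam solver,
# up to 1000 iterations; alpha = 0.5 is the l2 regularization strength.
mlp = MLPClassifier(hidden_layer_sizes=(100, 50), activation="relu",
                    solver="adam", max_iter=1000, alpha=0.5)
```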

Logistic Regression
This is a model which performs binary classification by fitting the data to a logit function which takes values between 0 and 1. The logit function in our case is defined as $l^k_{i,t} = \log\big(y^k_{i,t} / (1 - y^k_{i,t})\big)$. We perform a linear regression of $l^k_{i,t}$ on $x_{i,t}$ to obtain $l = \beta_0 + \beta_1 \cdot X$, where $l, X \in \mathbb{R}^{N \times T}$ are matrix representations of $l^k_{i,t}$ and $x_{i,t}$ across all $N$ samples and $T$ time points, and $\beta_0$ and $\beta_1$ are regression parameters [5]. There are a variety of methods for performing this regression in a statistically efficient manner. The method we used throughout this paper is L-BFGS [6], a memory-efficient adaptation of the Broyden-Fletcher-Goldfarb-Shanno algorithm, which falls under the category of iterative methods. Briefly, the algorithm iteratively updates the parameters along a search direction informed by an estimate of the Hessian of the objective function; this continues until a convergence criterion is met or a maximum number of iterations is reached. Additionally, we use $\ell_2$ regularization to reduce the variance of our regression in exchange for biasing our parameter estimates towards 0 [5]. Essentially, this means that we add a penalty term $\lambda \lVert \beta \rVert_2$, where $\lVert \cdot \rVert_2$ is the $\ell_2$ norm and $\beta$ is a matrix concatenation of $\beta_0$ and $\beta_1$.
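For concreteness, the regularized objective minimized by L-BFGS can be written in the standard form below; the squared penalty norm is the common convention (the text above writes the unsquared norm), and the exact scaling of $\lambda$ is implementation dependent:

\[
\hat{\beta} = \arg\min_{\beta} \; -\sum_{n=1}^{N} \Big[ y_n \log p_n + (1 - y_n)\log(1 - p_n) \Big] + \lambda \lVert \beta \rVert_2^2,
\qquad p_n = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot x_n)}}.
\]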

Random Forest
This model is an ensemble of decision tree classifiers [5]. The prediction of the model is determined by a majority vote over the predictions of the individual trees. Each tree is trained using an algorithm called CART (Classification and Regression Trees) [7], which iteratively splits nodes into branches based on an information criterion. For our work, we used the Gini impurity index.
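For reference, the Gini impurity of a candidate node is a standard quantity; the helper below is illustrative, not the paper's code:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_c p_c^2, where p_c is the fraction
    of samples in the node belonging to class c."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([1, 1, 1, 1]))  # 0.0: a pure node
print(gini_impurity([0, 1, 0, 1]))  # 0.5: a balanced binary node
```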

Neural Network
Neural networks consist of a sequence of layers, where the output of one layer $x_i$ becomes the input to the next layer $x_{i+1}$ as follows: $x_{i+1} = \sigma(w_i x_i + \beta_i)$, where $w_i$ and $\beta_i$ are model parameters which represent the neurons within the network, and $\sigma$ is an activation function which roughly binarizes its input [3]. In our work, we use the ReLU (rectified linear unit) activation function, defined as $\sigma(z) = z \, \chi(z > 0)$, where $\chi$ is an indicator function. We also train our network using the Adam optimizer [8]. This method falls under a family of optimization techniques based on gradient descent. Essentially, the optimizer moves in the direction opposite to the gradient of the objective function, towards a local minimum. In the case of Adam, an additional momentum term enables the algorithm to avoid poor local minima. As with the logistic regression, we also include $\ell_2$ regularization to reduce the variance of our models.
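A minimal NumPy sketch of this forward pass for the two-hidden-layer network described above; the weight shapes and the sigmoid output layer are assumptions for illustration:

```python
import numpy as np

def relu(z):
    # ReLU(z) = z * chi(z > 0)
    return z * (z > 0)

def forward(x, params):
    """Forward pass x_{i+1} = relu(W_i x_i + b_i) through the hidden layers.

    params: list of (W, b) pairs, e.g. shapes (100, 5), (50, 100), (1, 50)
    for 5 input features, hidden layers of 100 and 50 units, and one output.
    """
    for W, b in params[:-1]:
        x = relu(W @ x + b)
    W, b = params[-1]
    # Sigmoid output giving a mortality probability (an assumed output layer).
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))
```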